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ABSTRACT 

We present two projection techniques for computing approximate solutions to linear sys- 
tems of the form Ax n = b n , for a sequence n= 1,2, e.g., such as arises from time discretiza- 
tion of a partial differential equation. The inexpensive approximate solutions can be used 
as initial guesses for iterative solution of the system, resulting in significantly reduced com- 
putational expense. Examples of two- and three-dimensional incompressible Navier-Stokes 
calculations are presented in which x represents the pressure, and A is a discrete Poisson 
operator. In flows containing significant dynamic activity, these projection techniques lead 
to as much as a two-fold reduction in solution time. 
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1 Introduction 

We consider iterative solution of a sequence of linear problems having the form: 

V n : Ax n = b n , n = { 1,2,...} (1) 

where A is an m x m matrix, and x n is assumed to be a solution which is evolving with 
some parameter, e.g., time, in the case where (1) represents an implicit substep in numerical 
solution of a time dependent partial differential equation. When A is sufficiently sparse and 
not amenable to eigenfunction decomposition techniques (e.g., fast Poisson solvers), iterative 
methods are generally preferable to direct factorizations, both from the standpoint of storage 
and operation count. For the particular class of problems defined by (1), direct methods 
benefit from amortization of the one-time cost of matrix-factorization. However, they derive 
no benefit from the fact that successive problems might have very similar solutions, whereas 
iterative methods can exploit this possibilty as a good iniitial guess can lead to a significant 
reduction in the number of iterations required to bring the residual to within the specified 
tolerance. 

The idea of using information generated from previous right-hand sides to speed iterative 
solution processes is not new. It is of course standard to solve only for the change in the 
solution, Ax n = x n — x n ~ l (e.g., [1]). For Krylov based methods, more substantial gains can 
potentially be attained if the residual vectors spanning = {b n ~ l ^ 46 n T } 

are retained, and b n is projected onto this space, e.g., as presented by Saad [2] for a Lanczos 
method and Van der Vorst [3] for the conjugate gradient method. The drawback of these 
techniques is that the required basis set, K™ £ 7£ mxt+1 , might be quite large, and that there 
is no clear way to continue or update the set of saved vectors for a continuing sequence of 
right-hand sides. 

fn this paper we present two techniques for extracting information from previous prob- 
lems, Vk, n — l<k<n — 1, to generate initial guesses to the current problem, V n . The 
first approach is to simply remove any component of 6 n for which the solution is already 
known, by projecting 6 n onto the set of vectors {fe n *, . . . , 6 n ! } having associated solutions 
{ x n ~ *, . . . , x n ~ l }, and to solve the problem corresponding to the component of b n orthogonal 
to span{6 n_/ , , . . , 6 n_1 }. The second approach is a refinement of the first, which seeks the 
best approximation to x n in span{x n ~ l , . . . , x 71 ” 1 } with respect to a norm tailored to the 
convergence properties of the conjugate gradient method for the case when A is symmetric 
positive definite. These procedures are superior to those derived from extrapolation (e.g., 
based on high-order interpolants in time) in that projection techniques yield the best pos- 
sible approximation within a given basis set. Moreover, while extrapolation techniques run 
the risk of generating a poor initial guess, this is not possible with the methods proposed 
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here, as the projection is guaranteed to reduce the error in a relevant norm provided that x n 
has some component in span{x u ~ l , . . . , x n_1 }. In the case where x n is not well represented 
in this space the residual will be unchanged. 

The proposed projection techniques are similar to reduced basis methods used in non- 
linear finite element problems [4,5]. However, the current methods can be implemented as 
“black boxes”, with calls to any iterative solver, “solve J\" and, for reasons to be discussed, 
the forward operator application, “multiply .by. A 11 . 

The outline of the paper is as follows. In Section 2 we describe two projection techniques 
for generating initial guesses x ~ x n based on l previous solutions x k , n — l < k < 
n — 1. In Section 3 we describe an application of the technique to the incompressible 

Navier-Stokes equations and in Section 4 we present performance results for several fluid 
dynamics calculations. 

2 Projection Methods 

2.1 Method 1 

We begin by assuming that we have stored a set of vectors Bi = {6j,...,6/} and solution 
vectors Xi = {x.j, . . . , xj} satisfying: 

Ax * = h k = {\ (2) 

Though not requisite, Bi and Xi are assumed to be derived from the l most recent problems, 
' P k , Jfc = n — ,n — 1, i.e., span{B t } = span{b n _ h . . . ,b n _ j). To simplify the orthogonal- 
ization, Bi is assumed to be orthonormal: 

<hih > = S *i ’ ( 3 ) 

where 5,j is the Kroenecker delta, and < . > is an appropriately weighted inner-product. 
The algorithm is based upon the following Gram-Schmidt procedure: 

At time level n, input fc n : (4) 

c*k =< k n ,~h >■> k = \,...,l 
4 «— 6 "- ZjXkk 

solve Ax = b to tolerance e 

X n 4 X + E OtkX k 

update { B[,Xi } 
return x n 
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The computation of b is simply standard Gram-Schmidt orthogonalization of b with re- 
spect to B{. The Of,’s are computed in a group prior to modifying 6". in order that, in a 
parallel computation in which 6” and > are distributed across P processors, the global 
communication required for the inner-products may be carried out in a single 0(log 2 P ) data 
exchange of an /-vector, at a cost commensurate with that of a standard inner-product. If 
necessary, it is possible to employ a more stable modified Gram-Schmidt procedure [3,6] for 
the computation of 6, at the expense of / individual vector reductions. However, we have 
not found this to be necessary in any application to date, probably due to the fact that 
the orthogonality condition (3) results from a Gram-Schmidt procedure (below) rather than 
from recursion as in the case of Krylov methods. 

To complete the procedure (4), we require a mechanism for updating the basis sets 
Initially, the sets are empty, and can be filled with the first / solutions and data. 
In fact, since b 1 b k , k = {1,...,/} by construction, the vector pair {/>,£} seems like a 
likely candidate to add to the basis set. However, this will not, in general, be stable because 
Ax = b is not satisfied exactly. This situation can be corrected by re-computing the required 
inhomogeneity, i.e., setting b = Ax, and enforcing (3) via a second Gram-Schmidt procedure. 
Additionally, we need a strategy for deciding which vectors to keep when the size of the basis 
set exceeds available memory capacity. There are several possibilities, e.g., retaining those 
vectors which repeatedly capture most of the energy in 6". Initial trials indicate that a 
reasonable approach is to save just the solution to the current problem, x n = x + 
which is a near optimal linear combination of elements in the current basis set. 

We summarize the update procedure as follows. If L is taken to be the maximum number 
of vector pairs to be stored, i.e., / < L, then at each time step: 

If {I = L ) then: b\ * — Ax n / 1 1 Axf \ | (5) 

Ii * — xfjWAg^W 

1= 1 

else: b < — Ax_ 

OLk — ^ b) bk ^5 ^ 1 , . . . 5 / 

&/+i ♦ — (k-Y^<xkh k )/\\h-'Lo'khk\\ 

£/+i ♦ — (£- £«*£*)/! |£- E«*frfell 
/ = / + 1 

endif 

Here, ||.|| = < . > 2 . The procedure re-initializes {B^Xi} with the most recent solution pair 
when the memory limits are exceeded, and then reconstructs a set which satisfies (2-3). 
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2.2 Method 2: ^-conjugate Projection 

The procedure of the preceding section follows the intuitive line of reasoning that if 6” is 
well approximated by l = 22<*jkj, then x n will be well approximated by x = Y2 a jXj. The 
degree of approximation can be quantified by noting that 6 is the L 2 projection of 6 n onto 
Bi, which implies that x is the best approximation to x" in X\ with respect to the A 2 -norm: 
1 1x11*2 =< Ax, Ax_ >2 . If A is symmetric positive definite and conjugate gradient iteration 
is employed, it is sensible to begin with a projection which minimizes the distance between 
x n and Xi in the A-norm, ||x||* =< x,Ax > 2 , since the conjugate gradient method seeks 
approximations which successively minimize the error in the A-norm [7]. 

The derivation of the resultant projection method is based upon a straightforward min- 
imization procedure. Assuming as before that we have a set of previous solution vectors 
X, = { xj, i = 1, . . . , /, we seek coefficients a; such that the approximation given by 

x = Y2 01 ^ ( 6 ) 

t=i 

minimizes the error in the ^4-norm: 

Ik n -I||^ = {x n ) T Ax n - 2 ^ a,(xf Ax n ) + ^2'52a i a j (z[AaL j ) . (7) 

i = 1 1=1 J =1 

The minimization procedure is simplified if we insist that the x,’s are A-conjugate and 
normalized to satisfy: 

xf Axj = Sij . (8) 


Requiring a vanishing first variation of (7) with respect to a, leads to: 


or, = x[ AxJ 1 = x[b n , i = 1 , . . . , / 


( 9 ) 


Thus, given a set of vectors X[ = {xj satisfying (8), the best approximation to x" is found 
by simply projecting 6 n onto Xi. This forms the kernel of Method 2: 


At time level n, input b n : (10) 

a; = xjk n , i = 1, • • ■ , J 

x « — a iZ 4 

bi — If -Ax 

solve Ax = 6 to tolerance e 
X " i — x + x 
update {AT;} 
return x n 
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Notice that the storage for this procedure is roughly half that of Method 1 as it only requires 
Xi , and not Bi. However, one additional A-multiply is required prior to the solve.A stage. 

As before, we need a mechanism to update the set X\. To satisfy (8) it is necessary 
project the most recent solution, x n , onto Xf and normalize the result. If we do not insist 
upon a modified Gram-Schmidt procedure, this can be done with a single multiply by A as 
follows. If L is taken to be the maximum number of vector pairs to be stored, i.e., / < L, 
then at each time level: 

If {l = L) then: ij < — x n l\\x n \\ A (11) 

l « — 1 

else: — xf Ax. , i = 1 , . . . , / 

£/+i « — E««i.)IU 

/ 4 / + 1 

endif 

Note that in (11) the required normalization satisfies ||(x — Z) a «^..)IU = (£ T Ax — ) 2 

due to the the A-conjugate relationship (8) and can therefore be computed with no additional 
A multiplies. 

3 Navier- Stokes Implementation 

We have implemented the above projection techniques in spectral-element solution of the 
incompressible Navier-Stokes equations: 

^ + U ' Vu = -Vp + t^V 2 u in H, (12) 

dt 

V • u = 0 in H, 

where u is the velocity vector, p the pressure, and Re = ^ the Reynolds number based on 
a characteristic velocity and length scale, and kinematic viscosity. 

Spatial discretization is based upon decomposition of the computational domain into K 
spectral elements which are locally mapped to [— l,l] d in Rf . Within each element, the 
geometry, solution, and data are expanded in terms of high-order tensor-product polynomial 
bases in each coordinate direction. Variational projection operators are used to discretize the 
elliptic equations arising from a semi-implicit treatment of (12) and a consistent variational 
formulation is used for the pressure/divergence treatment. The velocity is represented by 
AMi-order Lagrange polynomials on the Gauss-Lobatto-Legendre quadrature points, with 
C° continuity enforced at element interfaces. The pressure is represented by polynomials of 
degree N — 2 based upon the Gauss-Legendre quadrature points. Temporal discretization 
is based upon an operator splitting in which the nonlinear convective terms are treated 
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explicitly via a characteristic/sub-cycling scheme, and the viscous and divergence operators 
are treated implicitly. The discretization leads to the following linear Stokes problem to be 
solved at each time step: 

Hu, ~Djp = Bf., i = , (13) 

Di Mi = 0 

Here, H is the discrete equivalent of the Helmholtz operator, { — ^V 2 + ^7}; B is the mass 
matrix associated with the velocity mesh; D = (D\, ..., Dd ) is the discrete gradient operator; 
and underscore refers to basis coefficients. Further details of spectral element discretizations 
for the Navier-Stokes equations may be found in [8]. 

The solution of (13) is simplified by a Stokes operator splitting which decouples the 
viscous and pressure/divergence constraint [9]. This splitting leads to the solution of a 
standard Helmholtz equation for each velocity component, while the resulting system for 
the pressure is similar to (13) save that H is replaced by -£jB. The resulting system can be 
efficiently treated by formally carrying out block Gaussian elimination (Uzawa decoupling) 
for p, leading to: 

Ep = 9 , (14) 

where 

E = -'tD.B-'Dj, (15) 

1=1 

and g is the inhomogeneity resulting from the time-split treatment of (12). E corresponds to 
a consistent Poisson operator for the pressure and, though symmetric-positive definite, is less 
well conditioned than the Helmholtz problems for the velocity components. Consequently, 
solution of (14) dominates the Navier-Stokes solution time. The advantage of the Stokes 
splitting is that no system solves are required when applying E, as B is diagonal. 

The consistent Poisson problem (14) is solved via a two-level iteration scheme developed 
by Rpnquist [10] in which a coarse-grid operator is folded into a global conjugate-gradient 
iteration through deflation [11,12]. The coarse (subscript c) and fine (subscript f) decompo- 
sition is effected through a subdomain-motivated prolongation operator J € lV nxK , where 
m = K(N — l)'* is the number of pressure degrees-of- freedom. The column space of the 
prolongation operator J is intended to approximate the span of the low eigenmodes of the 
E system; for this particular problem, J maps element-piecewise-constant functions to the 
tyi nodes of the underlying spectral element discretization. The pressure is then expressed as 
p = Jp^. + £ } , leading to an algebraic reformulation of the original problem as solvable fine 
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and coarse subproblems, 


(16) 

(17) 


EfP f = 1 ~JE c ' jT 9i 

EcPc = J T g ~ J Te P s i 

respectively. Here Ej = E — EJE~ 1 J T E, and E c — J T EJ. The fine system (16) 
is solved by conjugate-gradient iteration (with appropriate orthogonality conditions). Once 
Pj is established, the coarse-grid problem is solved (directly) for p c , and the procedure is 
complete. With appropriate application of a local, element-based preconditioner to E / , the 
condition number of the fine system is significantly reduced relative to the originating E 
matrix. 

The projection methods of Section 2 are implemented at the level of equation (14), rather 
than being applied directly to (16). At each time step, we solve for the change in pressure 
Ap" = p n — p n_1 . Thus, in the notation of Section 2, we take A = E, x n = A p n , and 
b n = g n - Ep n - J . 

4 Results and Conclusion 

We first consider the problem of two-dimensional start-up flow past a cylinder at Re = = 

200. The discretization consists of K = 116 spectral elements of degree TV = 9 (m = 7424), 
with time step At = .0168, non-dimensionalized with respect to Uoo and D. The tolerance 
for the Z /2 norm of the pressure residual was set to 3 x 10 -6 , a value commensurate with the 
achievable discrete divergence of the resultant velocity field in 32-bit precision. 

In Fig. 1 we plot the required number of pressure iterations per step, Ne, for the cases 
L = 0, 2, and 20, using the ^-conjugate projection technique of Section 2.2. For clarity, 
a 50 step windowed average is presented. Over the non-dimensional time simulated, t = 0 
to 150, the flow passes through three transient regimes: symmetric wake formation, wake 
destabilization, and periodic (von Karman) vortex shedding. The first and third regimes are 
characterized by a high level of dynamic activity, while the second is relatively quiescent, 
as illustrated in the lower half of Fig. 1 by the time trace of u at a point in the near wake 
region of the cylinder. In flows devoid of dynamics, the pressure at time t n is well represented 
by p n_1 . Hence, little improvement results from incorporating information from more than 
one time step, as seen in the quiescent regime (t ~ 10 — 50). However, for flows having a 
richer dynamical structure, the enriched basis of the projection method provides potential 
for significant savings, as seen in the von Karman street regime in which a two-fold reduction 
in iteration count is attained for L = 20. Increasing the number of basis functions to L = 30 
brings no further significant reduction in this case. 
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Table 1 compares the average iteration count per step in the von Karman street regime for 
several values of Re and A t. In all cases, the Method 2 (L = 20) yields a slight improvement 
over Method 1 (L = 20) and roughly a fifty-percent reduction over the standard case (L = 
0). This is typical of the performance observed in other two- and three-dimensional flows 
of similar complexity. In large three-dimensional flows, the savings in pressure iterations 
typically translates into a fifty-percent reduction in CPU time [13]. 


Table I: Iteration count for flow past a cylinder. 


Re 

At 

Standard 

Meth. 1 

(ratio) 

Meth. 2 

(ratio) 

100 

0.01 

65 

45 

(.68) 

39 

(.59) 

200 

0.01 

90 

48 

(.53) 

43 

(.48) 

100 

0.04 

125 

65 (.52) 

61 

(.49) 

200 

0.04 

159 

89 

(■56) 

85 

(.53) 


As a second example, we consider the benchmark problem of computing the growth rate 
of small amplitude three-dimensional Tollmien-Schlichting (TS) waves in plane Poiseuille 
flow at Re = 1500, e.g. [14]. The domain consists of two-flat plates separated by a distance 
2 h, periodic boundary conditions in the streamwise and spanwise directions with periodicity 
lengths ‘2,'irh. The initial condition is a parabolic profile with unit centerline velocity, with 
a superimposed three-dimensional TS wave corresponding to the least damped eigenmode 
having amplitude 10 -4 and horizontal wave numbers a and of unity. For the spectral 
element calculation with K = 54, N = 7, and non-dimensional time step At = .00625, the 



Figure 1: Pressure iteration count and time history of velocity for impulsively started flow 
past a cylinder at Re = 200. 
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observed growth rate is Im{u se) = -.028273, compared to Im^ur) = -.028230 predicted 
by linear theory. The calculations were performed in 64-bit arithmetic and the pressure 
tolerance was set to 10 -13 in order to observe high-order spatial and second-order temporal 
convergence rates. 

In Fig. 2a we compare the required number of pressure iterations for the /1-conjugate 
projection method ( L = 80) to the standard case ( L = 0). For this problem, the projection 
method reduces the number of iterations from roughly 60 to as few as one per time step, with 
an average of 3.3. A peak of roughly 20 iterations results when the basis set is restarted, e.g., 
at step 83. The addition of more basis vectors does not further reduce the iteration count 
other than by reducing the frequency of restart. The corresponding pre-solver residual and 
Navier-Stokes solution times shown in Figs. 2b and 2c indicate respective fifty- and four- 
fold reductions. The computations were carried out on an eight-node Intel iPSC/860. We 
note that the savings attained is typical for this particular class of problems. However, the 
performance of the projection techniques for these convergence benchmarks is exceptional 
and does not reflect the reduction attained in more general engineering flows. 

Finally, we remark that the O(mL) memory requirement for the projection methods may 
at first seem quite high. However, this must be examined in the context of the application. 
First, the present application is for a general geometry Navier-Stokes solver, rather than just 
a linear equation solver. Consequently, the total memory requirements are already quite high, 
as it is necessary to store the grid coordinates, metrics, Jacobians, etc., as well as several 
scalar and vector fields. In addition, efficient iterative solvers generally require significant 
storage for preconditioners - with memory costs scaling at least as m. Thus, the relative in- 
crease in memory demanded by saving a set of basis vectors may not be prohibitive. Secondly, 
on dedicated distributed memory machines, these algorithms provide a classic example of 
superlinear speedup; for a problem of fixed size, increasing the number of processors results 
in increased memory, thus allowing an increased value of L and corresponding decrease in 
Me- Our preference is to regard this fact as a flaw in the fixed-problem-size parallel perfor- 
mance metric, rather than to claim that projection techniques are a pathway to super-linear 
parallel algorithms. It is nonetheless a classic example of a space-time trade-off which can 
have a very real impact in many circumstances. 
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Figure 2: Iteration count (a), initial residual (b), and eight-processor iPSC/860 CPU time 
(c) for the ^-conjugate projection technique applied to the three-dimensional Tollmien- 
Schlichting wave benchmark problem with K = 54 elements of order N — 1 . 
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